
fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment

Zhu, Chunzheng, Shao, Jialin, Lin, Jianxin, Wang, Yijun, Wang, Jing, Tang, Jinhui, Li, Kenli

arXiv.org Artificial Intelligence

Understanding how the brain responds to external stimuli and decoding this process has been a significant challenge in neuroscience. While previous studies have typically concentrated on brain-to-image and brain-to-language reconstruction, our work strives to reconstruct the gestures associated with speech stimuli perceived by the brain. Unfortunately, the lack of paired {brain, speech, gesture} data hinders the deployment of deep learning models for this purpose. In this paper, we introduce a novel approach, fMRI2GES, that allows training of fMRI-to-gesture reconstruction networks on unpaired data using Dual Brain Decoding Alignment. This method relies on two key components: (i) observed texts that elicit brain responses, and (ii) textual descriptions associated with the gestures. Instead of training models in a fully supervised manner to find a mapping among the three modalities, we harness an fMRI-to-text model and a text-to-gesture model trained on paired data, together with an fMRI-to-gesture model trained on unpaired data, establishing dual fMRI-to-gesture reconstruction patterns. We then explicitly align the two outputs and train our model in a self-supervised manner. We show that our proposed method can reconstruct expressive gestures directly from fMRI recordings. We also investigate fMRI signals from different ROIs in the cortex and how they affect generation results. Overall, we provide new insights into decoding co-speech gestures, thereby advancing our understanding of neuroscience and cognitive science.
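The dual-alignment idea above can be sketched in a toy form: route 1 maps fMRI to gestures through frozen fMRI-to-text and text-to-gesture models, route 2 maps fMRI to gestures directly, and the direct model is trained only to agree with the indirect route. This is a minimal illustration with linear maps and made-up dimensions, not the paper's architecture.

```python
import random

random.seed(0)
DIM_FMRI, DIM_TXT, DIM_GES = 8, 6, 4  # assumed toy dimensions

def linear(x, W):
    # y_j = sum_i x_i * W[i][j]
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]

# Frozen reference path: fMRI -> text embedding -> gesture (stands in for
# models pretrained on paired {fMRI, text} and {text, gesture} data).
W_f2t = [[random.gauss(0, 0.3) for _ in range(DIM_TXT)] for _ in range(DIM_FMRI)]
W_t2g = [[random.gauss(0, 0.3) for _ in range(DIM_GES)] for _ in range(DIM_TXT)]

# Trainable direct path: fMRI -> gesture, without paired {fMRI, gesture} data.
W_f2g = [[0.0] * DIM_GES for _ in range(DIM_FMRI)]

def alignment_loss(x):
    ref = linear(linear(x, W_f2t), W_t2g)   # route 1: via text
    out = linear(x, W_f2g)                  # route 2: direct
    return sum((o - r) ** 2 for o, r in zip(out, ref)) / DIM_GES

def sgd_step(x, lr=0.05):
    # Self-supervised update: pull the direct path toward the indirect one.
    ref = linear(linear(x, W_f2t), W_t2g)
    out = linear(x, W_f2g)
    for i in range(DIM_FMRI):
        for j in range(DIM_GES):
            W_f2g[i][j] -= lr * 2 * (out[j] - ref[j]) * x[i] / DIM_GES

data = [[random.gauss(0, 1) for _ in range(DIM_FMRI)] for _ in range(32)]
before = sum(alignment_loss(x) for x in data) / len(data)
for _ in range(200):
    for x in data:
        sgd_step(x)
after = sum(alignment_loss(x) for x in data) / len(data)
print(f"alignment loss: {before:.4f} -> {after:.4f}")
```

The point of the sketch is that the direct model learns entirely from agreement between the two reconstruction routes; no gesture labels for the fMRI data are ever used.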


Advancing Autonomous Driving: DepthSense with Radar and Spatial Attention

Hussain, Muhammad Ishfaq, Naz, Zubia, Rafique, Muhammad Aasim, Jeon, Moongu

arXiv.org Artificial Intelligence

Depth perception is crucial for spatial understanding and has traditionally been achieved through stereoscopic imaging. However, the precision of depth estimation using stereoscopic methods depends on the accurate calibration of binocular vision sensors. Monocular cameras, while more accessible, often suffer from reduced accuracy, especially under challenging imaging conditions. Optical sensors, too, face limitations in adverse environments, leading researchers to explore radar technology as a reliable alternative. Although radar provides coarse but accurate signals, its integration with fine-grained monocular camera data remains underexplored. In this research, we propose DepthSense, a novel radar-assisted monocular depth enhancement approach. DepthSense employs an encoder-decoder architecture, a Radar Residual Network, feature fusion with a spatial attention mechanism, and an ordinal regression layer to deliver precise depth estimations. We conducted extensive experiments on the nuScenes dataset to validate the effectiveness of DepthSense. Our methodology not only surpasses existing approaches in quantitative performance but also reduces parameter complexity and inference times. Our findings demonstrate that DepthSense represents a significant advancement over traditional stereo methods, offering a robust and efficient solution for depth estimation in autonomous driving. By leveraging the complementary strengths of radar and monocular camera data, DepthSense sets a new benchmark in the field, paving the way for more reliable and accurate spatial perception systems.
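The ordinal regression layer mentioned above can be illustrated with a common construction: depth is discretized into K ordered bins and the head answers K binary questions of the form "is depth greater than t_k?". The bin count, range, and log-spaced thresholds below are assumed illustrative values, not DepthSense's actual configuration.

```python
import math

D_MIN, D_MAX, K = 1.0, 80.0, 10  # assumed depth range (metres) and bin count

def sid_thresholds(d_min, d_max, k):
    # Spacing-increasing discretization: thresholds spaced log-uniformly,
    # so nearby depths get finer bins than distant ones.
    return [math.exp(math.log(d_min) + (math.log(d_max) - math.log(d_min)) * i / k)
            for i in range(k + 1)]

T = sid_thresholds(D_MIN, D_MAX, K)

def encode(depth):
    # Ordinal target: 1 for every threshold the true depth exceeds.
    return [1 if depth > T[i] else 0 for i in range(1, K + 1)]

def decode(probs):
    # Rank = number of thresholds passed (p > 0.5); estimate = bin centre.
    k = sum(1 for p in probs if p > 0.5)
    return T[K] if k == K else 0.5 * (T[k] + T[k + 1])

est = decode(encode(5.0))
print(f"estimated depth for a 5.0 m pixel: {est:.2f} m")
```

Compared with direct regression, this ordinal formulation penalizes predictions by how many bins they are off, which is one reason ordinal heads are popular for monocular depth.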


Roundtables: Surviving the New Age of Conspiracies

MIT Technology Review

Watch a subscriber-only conversation unpacking our new series, "The New Conspiracy Age," and how this moment is changing science and technology. Everything is a conspiracy theory now. Watch a discussion with our editors and Mike Rothschild, journalist and conspiracy theory expert, about how we can make sense of them all.


Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement

Yang, Zhe, Li, Wenrui, Chen, Hongtao, Wang, Penghong, Xiong, Ruiqin, Fan, Xiaopeng

arXiv.org Artificial Intelligence

Abstract--Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, modality bias lets the advantaged modality dominate backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the dominant modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often directly and uniformly adjust the gradients of the advantaged modality, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective-gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism to estimate the contribution of the current dominant modality based on cross-modal semantics. When the task relies primarily on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multimodal gradient subspace and suppress the gradient according to redundancy. Experiments show that our method outperforms current major methods in most scenarios, and ablation experiments verify its effectiveness. The code is available at https://github.com/xia-zhe/RedReg.git
Index Terms--Multimodal learning, modality imbalance, information bottleneck
This work was supported in part by the National Key R&D Program of China (2023YFA1008501) and the National Natural Science Foundation of China (NSFC) under grants 624B2049 and U22B2035.
Wenrui Li, Penghong Wang, and Xiaopeng Fan are with the Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with Harbin Institute of Technology Suzhou Research Institute, Suzhou 215104, China. Hongtao Chen is with the School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China (e-mail: ht166chen@163.com).
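The final step of RedReg, projecting the dominant modality's gradient onto the orthogonal complement of the joint gradient, has a compact closed form. The sketch below shows that projection on plain Python vectors; the `(1 - redundancy)` damping factor is an assumed stand-in for however the paper scales suppression by its monitored redundancy score.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(g_dom, g_joint, redundancy, eps=1e-12):
    # Remove the component of the dominant modality's gradient that lies
    # along the joint multimodal gradient, keeping only the orthogonal part:
    #   g_orth = g_dom - (g_dom . g_joint / ||g_joint||^2) * g_joint
    coef = dot(g_dom, g_joint) / (dot(g_joint, g_joint) + eps)
    orth = [gd - coef * gj for gd, gj in zip(g_dom, g_joint)]
    # Damp the surviving component by the redundancy score in [0, 1]
    # (assumed form of "suppress the gradient according to redundancy").
    return [(1.0 - redundancy) * o for o in orth]

g_dom, g_joint = [2.0, 1.0], [1.0, 0.0]
g_new = project_out(g_dom, g_joint, redundancy=0.5)
print(f"suppressed gradient: [{g_new[0]:.3f}, {g_new[1]:.3f}]")
```

By construction the returned vector is orthogonal to `g_joint`, so the dominant modality can no longer push the shared parameters along the direction the joint objective already covers.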



KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering

Cui, Yuanning, Sun, Zequn, Hu, Wei, Fu, Zhangjie

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at reasoning but struggle with knowledge-intensive questions due to limited context and parametric knowledge. Existing methods that rely on fine-tuned LLMs or GNN retrievers, meanwhile, are limited by dataset-specific tuning and poor scalability on large or unseen graphs. We propose the LLM-KGFR collaborative framework, in which an LLM works with a structured retriever, the Knowledge Graph Foundation Retriever (KGFR). KGFR encodes relations using LLM-generated descriptions and initializes entities based on their roles in the question, enabling zero-shot generalization to unseen KGs. To handle large graphs efficiently, it employs Asymmetric Progressive Propagation (APP), a stepwise expansion that selectively limits high-degree nodes while retaining informative paths. Through node-, edge-, and path-level interfaces, the LLM iteratively requests candidate answers, supporting facts, and reasoning paths, forming a controllable reasoning loop. Experiments demonstrate that LLM-KGFR achieves strong performance while maintaining scalability and generalization, providing a practical solution for KG-augmented reasoning.
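The core of a stepwise expansion that "selectively limits high-degree nodes" can be sketched as a capped breadth-first traversal: each visited node contributes at most a fixed number of neighbours per step, so hub nodes cannot blow up the frontier. This is a simplified stand-in for APP (the `max_fanout` and `max_hops` knobs are assumptions; the paper's asymmetric, learned selection is not reproduced here).

```python
from collections import deque

def progressive_expand(adj, seeds, max_fanout=2, max_hops=2):
    # Stepwise expansion from the question's seed entities; `visited` is the
    # retrieved subgraph's node set.
    visited, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, hop = frontier.popleft()
        if hop >= max_hops:
            continue
        # High-degree hubs are truncated instead of fully expanded; a real
        # retriever would rank neighbours and keep the most informative ones.
        for nxt in adj.get(node, [])[:max_fanout]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hop + 1))
    return visited

adj = {
    "q": ["hub", "a"],
    "hub": ["b", "c", "d", "e", "f"],  # high-degree node
    "a": ["g"],
}
print(sorted(progressive_expand(adj, ["q"])))  # hub's tail neighbours d, e, f are pruned
```

The traversal keeps the frontier size bounded by `len(seeds) * max_fanout**max_hops` regardless of node degrees, which is what makes expansion over large graphs tractable.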


Parameter Interpolation Adversarial Training for Robust Image Classification

Liu, Xin, Yang, Yichen, He, Kun, Hopcroft, John E.

arXiv.org Artificial Intelligence

Though deep neural networks exhibit superior performance on various tasks, they remain plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective method of defending against adversarial attacks. However, existing adversarial training methods exhibit apparent oscillations in model robustness and overfitting during training, degrading the defense efficacy. To address these issues, we propose a novel framework called Parameter Interpolation Adversarial Training (PIAT). PIAT tunes the model parameters between epochs by interpolating the parameters of the previous and current epochs. This makes the decision boundary change more moderately and alleviates overfitting, helping the model converge better and achieve higher robustness. In addition, we suggest using the Normalized Mean Square Error (NMSE) to further improve robustness by aligning the relative magnitude of logits between clean and adversarial examples rather than the absolute magnitude. Extensive experiments conducted on several benchmark datasets demonstrate that our framework prominently improves the robustness of both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs).
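Both ingredients above are simple to state concretely. The sketch below shows epoch-wise parameter interpolation and one plausible reading of NMSE, comparing logit vectors after scaling each to unit norm so only relative magnitudes matter; the interpolation coefficient and the exact normalization are assumptions, not the paper's verified choices.

```python
def interpolate(theta_prev, theta_curr, alpha=0.6):
    # PIAT-style update between epochs: replace the current parameters with a
    # convex combination of the previous and current epoch's parameters
    # (alpha = 0.6 is an assumed value).
    return [alpha * p + (1 - alpha) * c for p, c in zip(theta_prev, theta_curr)]

def nmse(logits_clean, logits_adv, eps=1e-12):
    # Normalized MSE: scale each logit vector to unit norm before comparing,
    # so the loss aligns relative rather than absolute magnitudes.
    def unit(v):
        n = sum(x * x for x in v) ** 0.5 + eps
        return [x / n for x in v]
    a, b = unit(logits_clean), unit(logits_adv)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

theta = interpolate([1.0, 2.0], [3.0, 4.0])
print(f"interpolated parameters: [{theta[0]:.2f}, {theta[1]:.2f}]")
print(f"NMSE of rescaled logits: {nmse([1.0, 2.0], [2.0, 4.0]):.6f}")
```

Note that adversarial logits that are an exact rescaling of the clean logits incur (near-)zero NMSE, which is precisely the invariance to absolute magnitude the abstract describes.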


The Download: Introducing: the new conspiracy age

MIT Technology Review

Everything is a conspiracy theory now. Conspiracists are all over the White House, turning fringe ideas into dangerous policy. America's institutions are crumbling under the weight of deep suspicion and the lasting effects of covid isolation. Online echo chambers are getting harder to escape, and generative AI is altering the fabric of truth. A mix of technology and politics has given an unprecedented boost to once-fringe ideas--but they are pretty much the same fantasies that have been spreading for hundreds of years. MIT Technology Review helps break down how this moment is changing science and technology--and how we can make it through.


Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Chang, Tyler A., Arnett, Catherine, Eldesokey, Abdelrahman, Sadallah, Abdelrahman, Kashar, Abeer, Daud, Abolade, Olanihun, Abosede Grace, Mohammed, Adamu Labaran, Praise, Adeyemi, Sharma, Adhikarinayum Meerajita, Gupta, Aditi, Iyigun, Afitab, Simplício, Afonso, Essouaied, Ahmed, Chorana, Aicha, Eppa, Akhil, Oladipo, Akintunde, Ramesh, Akshay, Dorkin, Aleksei, Kondoro, Alfred Malengo, Aji, Alham Fikri, Çetintaş, Ali Eren, Hanbury, Allan, Dembele, Alou, Niksarli, Alp, Arroyo, Álvaro, Bajand, Amin, Khanna, Amol, Chkhaidze, Ana, Condez, Ana, Mkhonto, Andiswa, Hoblitzell, Andrew, Tran, Andrew, Poulis, Angelos, Majumder, Anirban, Vacalopoulou, Anna, Wong, Annette Kuuipolani Kanahele, Simonsen, Annika, Kovalev, Anton, S, Ashvanth., Lana, Ayodeji Joseph, Kinay, Barkin, Alhafni, Bashar, Busole, Benedict Cibalinda, Ghanem, Bernard, Nathani, Bharti, Đurić, Biljana Stojanovska, Agbonile, Bola, Bergsson, Bragi, Fischer, Bruce Torres, Tutar, Burak, Çınar, Burcu Alakuş, Kane, Cade J. Kanoniakapueo, Udomcharoenchaikit, Can, Arnett, Catherine, Helwe, Chadi, Nerella, Chaithra Reddy, Liu, Chen Cecilia, Nwokolo, Chiamaka Glory, España-Bonet, Cristina, Amol, Cynthia, Lee, DaeYeop, Arad, Dana, Dzenhaliou, Daniil, Pugacheva, Daria, Choi, Dasol, Abolade, Daud, Liu, David, Semedo, David, Popoola, Deborah, Mataciunas, Deividas, Nyaboke, Delphine, Kumar, Dhyuthy Krishna, Glória-Silva, Diogo, Tavares, Diogo, Goyal, Divyanshu, Lee, DongGeon, Anajemba, Ebele Nwamaka, Grace, Egonu Ngozi, Mickel, Elena, Tutubalina, Elena, Herranen, Elias, Anand, Emile, Habumuremyi, Emmanuel, Ajiboye, Emuobonuvie Maria, Yulianrifat, Eryawan Presma, Adenuga, Esther, Rudnicka, Ewa, Itiola, Faith Olabisi, Butt, Faran Taimoor, Thekkekara, Fathima, Haouari, Fatima, Tjiaranata, Filbert Aurelian, Laakom, Firas, Grasso, Francesca, Orabona, Francesco, Periti, Francesco, Solomon, Gbenga Kayode, Ngo, Gia Nghia, Udhehdhe-oze, Gloria, Martins, Gonçalo, Challagolla, Gopi Naga Sai Ram, Son, Guijin, Abdykadyrova, Gulnaz, 
Einarsson, Hafsteinn, Hu, Hai, Saffari, Hamidreza, Zaidi, Hamza, Zhang, Haopeng, Shairah, Harethah Abu, Vuong, Harry, Kuulmets, Hele-Andra, Bouamor, Houda, Yu, Hwanjo, Debess, Iben Nyholm, Deveci, İbrahim Ethem, Hanif, Ikhlasul Akmal, Cho, Ikhyun, Calvo, Inês, Vieira, Inês, Manzi, Isaac, Daud, Ismail, Itzhak, Itay, Iuliia, null, Alekseenko, null, Belashkin, Ivan, Spada, Ivan, Zhelyazkov, Ivan, Brinton, Jacob, Isbarov, Jafar, Čibej, Jaka, Čuhel, Jan, Kocoń, Jan, Krito, Jauza Akbar, Purbey, Jebish, Mickel, Jennifer, Za, Jennifer, Kunz, Jenny, Jeong, Jihae, Dávalos, Jimena Tena, Lee, Jinu, Magalhães, João, Yi, John, Kim, Jongin, Chataignon, Joseph, Imperial, Joseph Marvin, Thevakumar, Jubeerathan, Land, Judith, Jiang, Junchen, Kim, Jungwhan, Sirts, Kairit, R, Kamesh, V, Kamesh, Tshinu, Kanda Patrick, Kukk, Kätriin, Ponkshe, Kaustubh, Huseynova, Kavsar, He, Ke, Buchanan, Kelly, Sarveswaran, Kengatharaiyer, Zaman, Kerem, Mrini, Khalil, Kyars, Kian, Kruusmaa, Krister, Chouhan, Kusum, Krishnakumar, Lainitha, Sánchez, Laura Castro, Moscoso, Laura Porrino, Choshen, Leshem, Sencan, Levent, Øvrelid, Lilja, Alazraki, Lisa, Ehimen-Ugbede, Lovina, Thevakumar, Luheerathan, Thavarasa, Luxshan, Malik, Mahnoor, Keita, Mamadou K., Jangid, Mansi, De Santis, Marco, García, Marcos, Suppa, Marek, D'Ciofalo, Mariam, Ojastu, Marii, Sikander, Maryam, Narayan, Mausami, Skandalis, Maximos, Mehak, Mehak, Bozkurt, Mehmet İlteriş, Workie, Melaku Bayu, Velayuthan, Menan, Leventhal, Michael, Marcińczuk, Michał, Potočnjak, Mirna, Shafiei, Mohammadamin, Sharma, Mridul, Indoria, Mrityunjaya, Habibi, Muhammad Ravi Shulthan, Kolić, Murat, Galant, Nada, Permpredanun, Naphat, Maugin, Narada, Corrêa, Nicholas Kluge, Ljubešić, Nikola, Thomas, Nirmal, de Silva, Nisansa, Joshi, Nisheeth, Ponkshe, Nitish, Habash, Nizar, Udeze, Nneoma C., Thomas, Noel, Ligeti-Nagy, Noémi, Coulibaly, Nouhoum, Faustin, Nsengiyumva, Buliaminu, Odunayo Kareemat, Ogundepo, Odunayo, Fejiro, Oghojafor Godswill, Funmilola, Ogundipe 
Blessing, God'spraise, Okechukwu, Samuel, Olanrewaju, Oluwaseun, Olaoye Deborah, Akindejoye, Olasoji, Popova, Olga, Snissarenko, Olga, Chiemezie, Onyinye Anulika, Kinay, Orkun, Tursun, Osman, Moses, Owoeye Tobiloba, Joshua, Oyelade Oluwafemi, Fiyinfoluwa, Oyesanmi, Gamallo, Pablo, Fernández, Pablo Rodríguez, Arora, Palak, Valente, Pedro, Rupnik, Peter, Ekiugbo, Philip Oghenesuowho, Sahoo, Pramit, Prokopidis, Prokopis, Niau-Puhipau, Pua, Yahya, Quadri, Mignone, Rachele, Singhal, Raghav, Kadiyala, Ram Mohan Rao, Merx, Raphael, Afolayan, Rapheal, Rajalakshmi, Ratnavel, Ghosh, Rishav, Oji, Romina, Solis, Ron Kekeha, Guerra, Rui, Zawar, Rushikesh, Bashir, Sa'ad Nasir, Alzaabi, Saeed, Sandeep, Sahil, Batchu, Sai Pavan, Kantareddy, SaiSandeep, Pranida, Salsabila Zahirah, Buchanan, Sam, Rutunda, Samuel, Land, Sander, Sulollari, Sarah, Ali, Sardar, Sapkota, Saroj, Tautvaisas, Saulius, Sen, Sayambhu, Banerjee, Sayantani, Diarra, Sebastien, M, SenthilNathan., Lee, Sewoong, Shah, Shaan, Venkitachalam, Shankar, Djurabaeva, Sharifa, Ibejih, Sharon, Dutta, Shivanya Shomir, Gupta, Siddhant, Suárez, Silvia Paniagua, Ahmadi, Sina, Sukumar, Sivasuthan, Song, Siyuan, A., Snegha, Sofianopoulos, Sokratis, Simon, Sona Elza, Benčina, Sonja, Gvasalia, Sophie, More, Sphurti Kirit, Dragazis, Spyros, Kaufhold, Stephan P., S, Suba., AlRashed, Sultan, Ranathunga, Surangika, Someya, Taiga, Pungeršek, Taja Kuzman, Haklay, Tal, Jibril, Tasi'u, Aoyama, Tatsuya, Abashidze, Tea, Cruz, Terenz Jomar Dela, Blevins, Terra, Nikas, Themistoklis, Idoko, Theresa Dora, Do, Thu Mai, Chubakov, Tilek, Gargiani, Tommaso, Rathore, Uma, Johannesen, Uni, Ugwu, Uwuma Doris, Putra, Vallerie Alexandra, Kumar, Vanya Bannihatti, Jeyarajalingam, Varsha, Arzt, Varvara, Nedumpozhimana, Vasudevan, Ondrejova, Viktoria, Horbik, Viktoryia, Kummitha, Vishnu Vardhan Reddy, Dinić, Vuk, Sewunetie, Walelign Tewabe, Wu, Winston, Zhao, Xiaojing, Diarra, Yacouba, Nikankin, Yaniv, Mathur, Yash, Chen, Yixi, Li, Yiyuan, Xavier, Yolanda, 
Belinkov, Yonatan, Abayomi, Yusuf Ismail, Alyafeai, Zaid, Shan, Zhengyang, Tam, Zhi Rui, Tang, Zilu, Nadova, Zuzana, Abbasi, Baber, Biderman, Stella, Stap, David, Ataman, Duygu, Schmidt, Fabian, Gonen, Hila, Wang, Jiayi, Adelani, David Ifeoluwa

arXiv.org Artificial Intelligence

To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.


Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts

Hua, Yongxiang, Cao, Haoyu, Tao, Zhou, Li, Bocheng, Wu, Zihao, Liu, Chaohu, Xu, Linli

arXiv.org Artificial Intelligence

Sparse Mixture of Experts (sMoE) has become a pivotal approach for scaling large vision-language models, offering substantial capacity while maintaining computational efficiency through dynamic, sparse activation of experts. However, existing routing mechanisms, typically based on similarity scoring, struggle to effectively capture the underlying input structure. This limitation leads to a trade-off between expert specialization and balanced computation, hindering both scalability and performance. We propose Input Domain Aware MoE, a novel routing framework that leverages a probabilistic mixture model to better partition the input space. By modeling routing probabilities as a mixture of distributions, our method enables experts to develop clear specialization boundaries while achieving balanced utilization. Unlike conventional approaches, our routing mechanism is trained independently of task-specific objectives, allowing for stable optimization and decisive expert assignments. Empirical results on vision-language tasks demonstrate that our method consistently outperforms existing sMoE approaches, achieving higher task performance and improved expert utilization balance.
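Routing by a probabilistic mixture model can be illustrated with the simplest case: each expert owns one Gaussian component over the input domain, and a token is routed by its posterior responsibility under the mixture. This is a deliberately simplified 1-D stand-in for the paper's mechanism (real token features are high-dimensional, and the component parameters here are assumed).

```python
import math

def responsibilities(x, means, var=1.0, weights=None):
    # Posterior responsibility of each mixture component for input x:
    #   r_k ∝ w_k * exp(-(x - mu_k)^2 / (2 * var))
    k = len(means)
    w = weights or [1.0 / k] * k
    dens = [wi * math.exp(-((x - m) ** 2) / (2 * var)) for wi, m in zip(w, means)]
    z = sum(dens)
    return [d / z for d in dens]

# Each expert "owns" a region of the input domain via its component mean,
# so specialization boundaries fall out of the mixture rather than a
# task-driven similarity score.
means = [-2.0, 0.0, 2.0]
r = responsibilities(1.8, means)
expert = max(range(len(r)), key=r.__getitem__)
print(f"responsibilities: {[f'{p:.3f}' for p in r]} -> expert {expert}")
```

Because the responsibilities sum to one and depend only on where the input falls in the modeled domain, this kind of router can be fit independently of the task loss, which is the decoupling the abstract argues for.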